Abstract

This paper addresses the issue of word-sense ambiguity in extraction
from machine-readable resources for the construction of large-scale
knowledge sources. We describe two experiments: one which took
word-sense distinctions into account, resulting in 97.9% accuracy for
semantic classification of verbs based on (Levin, 1993); and one which
ignored word-sense distinctions, resulting in 6.3% accuracy. These
experiments were dual purpose: (1) to validate the central thesis of
the work of (Levin, 1993), i.e., that verb semantics and syntactic
behavior are predictably related; (2) to demonstrate that a 20-fold
improvement can be achieved in deriving semantic information from
syntactic cues if we first divide the syntactic cues into distinct
groupings that correlate with different word senses. Finally, we show
that we can provide effective acquisition techniques for novel word
senses using a combination of online sources.

1  Introduction 

This paper addresses the issue of word-sense ambiguity in extraction
from machine-readable resources for the construction of large-scale
knowledge sources. We describe two experiments: one which took
word-sense distinctions into account, resulting in 97.9% accuracy for
semantic classification of verbs based on (Levin, 1993); and one which
ignored word-sense distinctions, resulting in 6.3% accuracy. These
experiments were dual purpose: (1) to validate the central thesis of
the work of (Levin, 1993), i.e., that verb semantics and syntactic
behavior are predictably related; (2) to demonstrate that a 20-fold
improvement can be achieved in deriving semantic information from
syntactic cues if we first divide the syntactic cues into distinct
groupings that correlate with different word senses. Finally, we show
that we can provide effective acquisition techniques for novel word
senses using a combination of online sources, in particular, Longman’s
Dictionary of Contemporary English (LDOCE) (Procter, 1978), Levin’s
verb classification scheme (Levin, 1993), and WordNet (Miller, 1985).
We have used these techniques to build a database of 10,000 English
verb entries containing semantic information that we are currently
porting into languages such as Arabic, Spanish, and Korean for
multilingual NLP tasks such as foreign language tutoring and machine
translation.

2  Automatic  Lexical  Acquisition  for 
NLP  Tasks 

As machine-readable resources (i.e., online dictionaries, thesauri,
and other knowledge sources) become readily available to NLP
researchers, automated acquisition has become increasingly attractive.
Several researchers have noted that the average time needed to
construct a lexical entry can be as much as 30 minutes (see, e.g.,
(Neff and McCord, 1990; Copestake et al., 1995; Walker and Amsler,
1986)). Given that we are aiming for large-scale lexicons of
20,000-60,000 words, automation of the acquisition process has become
a necessity.

Previous research in automatic acquisition focuses primarily on the
use of statistical techniques, such as bilingual alignment (Church and
Hanks, 1990; Klavans and Tzoukermann, 1996; Wu and Xia, 1995), or
extraction of syntactic constructions from online dictionaries and
corpora (Brent, 1993; Dorr et al., 1995). In such cases, the objective
is typically to build a large set of translation equivalences between
words and phrases, e.g., for transfer MT. Others who have taken a more
knowledge-based (interlingual) approach (Lonsdale et al., 1996) do not
provide a means for systematically deriving the relation between
surface syntactic structures and their underlying semantic
representations. Such approaches tend to ignore the wide range of
argument structures (beyond intransitive and transitive) that could
potentially be associated with verbs. Those who have taken more
sophisticated argument structures into account, e.g., (Copestake et
al., 1995), do not take full advantage of the systematic relation
between syntax and semantics during the lexical acquisition stage. Our
own approach exploits certain linguistic constraints that govern the
relation between syntactic structure and word meaning. We demonstrate
that verb meaning can be systematically derived from information about
syntactic realizations; these meaning components are used to build
verb entries which are then ported into different languages.

3  Syntax-Semantics  Relation:  Verb 
Classification  Based  on  Syntactic 
Behavior 

The central thesis of (Levin, 1993) is that the semantics of a verb
and its syntactic behavior are predictably related. As a demonstration
that such predictable relationships are not confined to an
insignificant portion of the vocabulary, Levin surveys 4183 verbs,
grouped into 191 semantic classes in Part Two of her book. The
syntactic behavior of these classes is illustrated with 1668 example
sentences, an average of 8 sentences per class.

Given the scope of Levin’s work, it is not easy to verify the central
thesis. To this end, we created a database of Levin’s verb classes and
example sentences from each class, and wrote a parser to extract basic
syntactic patterns from the sentences.1 We then characterized each
semantic class by a set of syntactic patterns, which we call a
syntactic signature, and used the resulting database as the basis of
two experiments, both designed to discover whether the syntactic
signatures tell us anything about the meaning of the verbs.2 The first
experiment, which we label Class-Based, implicitly takes word-sense
distinctions into account by considering each occurrence of a verb
individually and assigning it a single syntactic signature according
to class membership. The second experiment, which we label Verb-Based,
ignores word-sense distinctions by assigning one syntactic signature
to each verb, regardless of whether it occurred in multiple classes.

The remainder of this section describes the assignment of signatures
to semantic classes and the two experiments for determining the
relation of syntactic information to semantic classes. We will see
that our classification technique shows a 20-fold improvement in the
experiment where we implicitly account for word-sense distinctions.

1 Both the database and the parser are encoded in Quintus Prolog.

2 The design of this experiment is inspired by the work of (Dubois and
Saint-Dizier, 1995). In particular, we depart from the
alternation-based data in (Levin, 1993), which is primarily binary in
that sentences are presented in pairs which constitute an alternation.
Following Saint-Dizier’s work, we construct N-ary syntactic
characterizations. The choice is of no empirical consequence, but it
simplifies the experiment by eliminating the problem of naming the
syntactic patterns.

3.1  Assignment  of  Signatures  to  Semantic 
Classes 

In order to assign signatures to semantic classes, we first needed to
decide what syntactic information to extract. It turns out that a very
simple strategy works very well, namely, flat parses that contain
lists of the major categories in the sentence, the verb, and a handful
of other elements. The “parse”, then, for the sentence Tony broke the
crystal vase is simply the syntactic pattern [np,v,np]. For Tony broke
the vase to pieces we get [np,v,np,pp(to)]. Notice that the pp node is
marked with its head preposition. Figure 1 shows an example class, the
break subclass of the Change of State verbs (45.1), along with example
sentences and the derived syntactic signature based on sentence
patterns. Positive example sentences are denoted by the number 1 in
the sentence patterns and negative example sentences are denoted by
the number 0 (corresponding to sentences marked with a *).

Verbs: break, chip, crack, crash, crush, fracture, rip, shatter,
smash, snap, splinter, split, tear

Example Sentences:

  Crystal vases break easily.
  The hammer broke the window.
  The window broke.
  Tony broke her arm.
  Tony broke his finger.
  Tony broke the crystal vase.
  Tony broke the cup against the wall.
  Tony broke the glass to pieces.
  Tony broke the piggy bank open.
  Tony broke the window with a hammer.
  Tony broke the window.
  * Tony broke at the window.
  * Tony broke herself on the arm.
  * Tony broke himself.
  * Tony broke the wall with the cup.
  A break.

Derived Syntactic Signature:

  1-[np,v]
  1-[np,v,np]
  1-[np,v,np,adjective]
  1-[np,v,np,pp(against)]
  1-[np,v,np,pp(to)]
  1-[np,v,np,pp(with)]
  1-[np,v,poss,np]
  1-[np,v,adv(easily)]
  1-[n]
  0-[np,v,np,pp(with)]
  0-[np,v,self]
  0-[np,v,self,pp(on)]
  0-[np,v,pp(at)]

Figure 1: Syntactic Signature for Change of State - break subclass
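As an illustration only, the signature in Figure 1 can be modeled as a
set of polarity-marked flat patterns. The sketch below is in Python
rather than the Quintus Prolog used for the actual database and
parser; the names Pattern, Frame, and make_signature are hypothetical,
and the pairing of each example sentence with a pattern is our own
reading of the figure, not output of the original parser.

```python
from typing import FrozenSet, Iterable, Tuple

Pattern = Tuple[str, ...]      # e.g. ("np", "v", "np", "pp(to)")
Frame = Tuple[int, Pattern]    # 1 = positive example, 0 = negative (*) example
Signature = FrozenSet[Frame]


def make_signature(examples: Iterable[Tuple[bool, Pattern]]) -> Signature:
    """Collapse parsed example sentences into a set of polarity-marked patterns."""
    return frozenset((1 if positive else 0, pattern) for positive, pattern in examples)


# Hand-entered patterns for the break subclass (assumed sentence-to-pattern pairing).
BREAK_EXAMPLES = [
    (True, ("np", "v", "adv(easily)")),        # Crystal vases break easily.
    (True, ("np", "v", "np")),                 # The hammer broke the window.
    (True, ("np", "v")),                       # The window broke.
    (True, ("np", "v", "poss", "np")),         # Tony broke her arm.
    (True, ("np", "v", "poss", "np")),         # Tony broke his finger.
    (True, ("np", "v", "np")),                 # Tony broke the crystal vase.
    (True, ("np", "v", "np", "pp(against)")),  # Tony broke the cup against the wall.
    (True, ("np", "v", "np", "pp(to)")),       # Tony broke the glass to pieces.
    (True, ("np", "v", "np", "adjective")),    # Tony broke the piggy bank open.
    (True, ("np", "v", "np", "pp(with)")),     # Tony broke the window with a hammer.
    (True, ("np", "v", "np")),                 # Tony broke the window.
    (False, ("np", "v", "pp(at)")),            # * Tony broke at the window.
    (False, ("np", "v", "self", "pp(on)")),    # * Tony broke herself on the arm.
    (False, ("np", "v", "self")),              # * Tony broke himself.
    (False, ("np", "v", "np", "pp(with)")),    # * Tony broke the wall with the cup.
    (True, ("n",)),                            # A break.
]

BREAK_SIGNATURE = make_signature(BREAK_EXAMPLES)
```

Because the signature is a set, the sixteen example sentences collapse
to thirteen distinct frames, and the same pattern ([np,v,np,pp(with)])
can occur once with positive and once with negative polarity.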

3.2  Experiment  1:  Class-based  Approach 

In the first experiment, we attempt to discover whether each syntactic
signature uniquely identifies a single semantic class. The outline for
this class-based experiment is as follows:

1. Automatically extract syntactic information from the example
sentences to yield the syntactic signature for the class.

2. Discover which semantic classes have uniquely-identifying syntactic
signatures.

When we parsed the 1668 example sentences in Part Two of Levin’s book
(including the negative examples), these sentences reduced to 282
unique patterns. The 191 sets of sentences listed with each of the 191
semantic classes in turn reduce to 189 unique syntactic signatures.
187 of them uniquely identify a semantic class, meaning that 97.9% of
the classes have uniquely identifying syntactic signatures. As it
turns out, only two classes do not have enough syntactic information
to distinguish them uniquely.
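The counting step of this experiment can be sketched as follows. This
is a minimal illustration in Python, not the original Prolog code; the
function name uniquely_identified and the toy classes A-D with their
frames are invented for the example.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def uniquely_identified(class_signatures: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the classes whose signature is shared with no other class."""
    counts = Counter(class_signatures.values())
    return sorted(cls for cls, sig in class_signatures.items() if counts[sig] == 1)


# Toy data (invented): classes C and D have identical signatures.
TOY_CLASSES = {
    "A": frozenset({"1-[np,v]", "1-[np,v,np]"}),
    "B": frozenset({"1-[np,v]", "0-[np,v,np]"}),
    "C": frozenset({"1-[np,v,pp(at)]"}),
    "D": frozenset({"1-[np,v,pp(at)]"}),
}

unique_classes = uniquely_identified(TOY_CLASSES)
share = len(unique_classes) / len(TOY_CLASSES)

# The percentage reported in the text: 187 of 191 classes.
reported = round(100 * 187 / 191, 1)
```

On the toy data only A and B are uniquely identified; applied to the
figures in the text, 187 out of 191 rounds to the reported 97.9%.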

Because we were interested in the role of prepositions in the
signatures, we also ran the experiment with two different parse types:
ones that ignored the actual prepositions in the pp’s, and parses that
threw away all information except for the values of the prepositions.
Interestingly, we still got useful results with these impoverished
parses, although fewer semantic classes had uniquely-identifying
syntactic signatures under these conditions. These results are shown
in Figure 2.

We note that the use of negative examples, i.e., plausible uses of the
verb in contexts which are disallowed, was a key component of this
experiment. There are 1082 positive examples and 586 negative
examples. Although this evidence is useful, it is not available in
dictionaries, corpora, or other convenient resources that could be
used to extend Levin’s classification. Thus, to extend our approach to
novel word senses (i.e., words not occurring in Levin), we would not
be able to use negative evidence. For this reason, we felt it
necessary to determine the importance of negative evidence for
building uniquely identifying syntactic signatures. As one might
expect, throwing out the negative evidence degrades the usefulness of
the signatures across the board. In the best case, using only the
positive evidence to identify semantic classes, 88.0% of the semantic
classes have uniquely identifying syntactic signatures. See Figure 2
for the full results.

3.3  Experiment  2:  Verb-based  Approach 

In this experiment, we abstracted away from word sense distinctions
and considered each verb only once, regardless of whether it occurred
in multiple classes. In fact, 46% of the verbs appear more than once.
In some cases, the verb appears to have a related sense even though it
appears in different classes. For example, the verb roll appears in
two subclasses of Manner of Motion Verbs that are distinguished on the
basis of whether the grammatical subject is animate or inanimate. In
other cases, the verb may have (largely) unrelated senses. For
example, the verb move is both a Manner of Motion verb and a verb of
Psychological State.

Figure 2: Overall Results

The composition of a syntactic signature is different for this
experiment. Here, we collect all of the syntactic patterns associated
with every class a particular verb appears in, regardless of whether
that verb is semantically related in the different classes. Now a
syntactic signature is the union of the frames extracted from every
example sentence for each verb. The outline of the verb-based
experiment is as follows:

1. Automatically extract syntactic information from the example
sentences.

2. Group the verbs according to their syntactic signature.

3. See where the two ways of grouping verbs overlap:

(a) the semantic classification given by Levin.

(b) the syntactic classification based on the derived syntactic
signatures.

To return to the Change of State verbs, we now consider the syntactic
signature of the verb break, rather than the signature of the semantic
class as a unit. The verb break belongs not only to the Change of
State class, but also to four other classes: 10.6 Cheat, 23.2 Split,
40.8.3 Hurt, and 48.1.1 Appear. Each of these classes is characterized
syntactically with a set of sentences. The union of the syntactic
patterns corresponding to these sentences forms the syntactic
signature for the verb. So although the signature for the Change of
State class had 13 frames, the verb break has 39 frames from the other
classes it appears in.
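The union construction can be sketched in a few lines. This is an
illustrative Python version under stated assumptions: the function
name verb_based_signatures is hypothetical, and the classes X and Y,
their members, and their frames are invented toy data, not Levin’s.

```python
from typing import Dict, FrozenSet, Iterable, Set


def verb_based_signatures(
    class_members: Dict[str, Iterable[str]],
    class_signatures: Dict[str, Set[str]],
) -> Dict[str, FrozenSet[str]]:
    """Give each verb the union of the frames of every class it belongs to."""
    merged: Dict[str, Set[str]] = {}
    for cls, verbs in class_members.items():
        for verb in verbs:
            merged.setdefault(verb, set()).update(class_signatures[cls])
    return {verb: frozenset(frames) for verb, frames in merged.items()}


# Toy data (invented): "break" is a member of both classes.
TOY_MEMBERS = {"X": ["break", "chip"], "Y": ["break", "tear"]}
TOY_FRAMES = {
    "X": {"1-[np,v]", "1-[np,v,np]"},
    "Y": {"1-[np,v,np]", "0-[np,v,pp(at)]"},
}

TOY_VERB_SIGNATURES = verb_based_signatures(TOY_MEMBERS, TOY_FRAMES)
```

Here break ends up with three frames while chip and tear keep the two
frames of their single class, so the ambiguous verb no longer shares a
signature with either of its classmates.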

One way to view the difference between this experiment and the
previous one is the difference between the intension of a function
versus its extension. In this case, we are interested in the functions
that group the verbs syntactically and semantically. Intensionally
speaking, the definition of the function that groups verbs
semantically would have something to do with the actual meaning of the
verbs.3 Likewise, the intension of the function that groups verbs
syntactically would be defined in terms of something strictly
syntactic, such as subcategorization frames. But the intensions of
these functions are matters of significant theoretical investigation,
and although much has been accomplished in this area, the question of
mapping syntax to semantics and vice versa is an open research topic.
Therefore, we can turn to the extensions of the functions: the actual
groupings of verbs, based on these two separate criteria. The semantic
extensions are sets of verb tokens, and likewise, the syntactic
extensions are sets of verb tokens. To the extent that these functions
map between syntax and semantics intensionally, they will pick out the
same verbs extensionally.

So for the verb-based experiment, we need a different methodology to
establish relatedness between the syntactic signatures and the
semantic classes, since the signatures are now mediated by the verbs
themselves. A direct method is to compare the two orthogonal groupings
of the inventory of verbs: the semantic classes defined by Levin and
the sets of verbs that correspond to each of the derived syntactic
signatures. When these two groupings overlap, we have discovered a
mapping from the syntax of the verbs to their semantics. More
specifically, let us define the overlap index as the number of
overlapping verbs divided by the average of the number of verbs in the
semantic class and the number of verbs in the syntactic signature.
Thus an overlap index of 1.00 indicates a complete overlap and an
overlap index of 0 indicates that the two sets are completely
disjoint. In this experiment, the sets of verbs with a high overlap
index are of interest.
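The overlap index defined above can be written directly as a small
function. The Python sketch below is illustrative; the name
overlap_index and the example verb sets are our own, and the treatment
of two empty sets as an error is an assumption, since the text does
not discuss that case.

```python
from typing import Iterable


def overlap_index(semantic_class: Iterable[str], signature_group: Iterable[str]) -> float:
    """Shared verbs divided by the average size of the two verb sets."""
    a, b = set(semantic_class), set(signature_group)
    if not a and not b:
        # Assumption: the index is undefined when both sets are empty.
        raise ValueError("overlap index is undefined for two empty sets")
    return len(a & b) / ((len(a) + len(b)) / 2)


# Invented example sets, for illustration only.
complete = overlap_index({"break", "chip"}, {"chip", "break"})
disjoint = overlap_index({"break", "chip"}, {"roll", "move"})
partial = overlap_index({"break", "chip", "crack"},
                        {"break", "chip", "tear", "rip", "snap"})
```

In the partial case two verbs are shared and the set sizes are 3 and
5, so the index is 2 / 4 = 0.5.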

If we use the class-based syntactic signatures containing
preposition-marked pp’s and both positive and negative evidence, the
1668 example sentences reduce to 282 syntactic patterns, just as
before. But now there are 748 verb-based syntactic signatures, as
compared with 189 class-based signatures from before. Since there are
far more syntactic signatures than the 191 semantic classes, it is
clear that the mapping between signatures and semantic classes is not
direct. Only 12 mappings have complete overlaps. That means 6.3% of
the 191 semantic classes have a complete overlap with a syntactic
signature.

3 An example of the intensional characterization of the Levin classes
is given by the definitions of Lexical Conceptual Structures which
correspond to each of Levin’s semantic classes. See (Dorr and Voss, to
appear).

4  The  Role  of  Word-Sense 
Disambiguation 

In the class-based experiment, we counted the percentage of semantic
classes that had uniquely identifying signatures. In the verb-based
experiment, we counted the number of perfect overlaps (i.e., index of
1.00) between the verbs as grouped in the semantic classes and grouped
by syntactic signature. The overall results of the suite of
experiments, illustrating the role of disambiguation, negative
evidence, and prepositions, are shown in Figure 2. There were three
ways of treating prepositions: (i) mark the pp with the preposition,
(ii) ignore the preposition, and (iii) keep only the prepositions. For
these different strategies, we see the percentage of perfect overlaps,
as well as both the median and mean overlap ratios for each
experiment. These data show that the most important factor in the
experiments is word-sense disambiguation.

5  Semantic  Classification  of  Novel 
Words 

As we saw above, word sense disambiguation is critical to the success
of any lexical acquisition algorithm. The Levin-based verbs are
already disambiguated by virtue of their membership in different
classes. The difficulty, then, is to disambiguate and classify verbs
that do not occur in Levin. Our current direction is to make use of
the results of the first two experiments, i.e., the relation between
syntactic patterns and semantic classes, but to use two additional
techniques for disambiguation and classification of non-Levin verbs:
(1) extraction of synonym sets provided in WordNet (Miller, 1985), an
online lexical database containing thesaurus-like relations such as
synonymy; and (2) selection of appropriate synonyms based on
correlations between syntactic information in Longman’s Dictionary of
Contemporary English (LDOCE) (Procter, 1978) and semantic classes in
Levin. The basic idea is to first determine the most likely candidates
for semantic classification of a verb by examining the verb’s synonym
sets, many of which intersect directly with the verbs classified by
Levin. The “closest” synonyms are then selected from these sets by
comparing the LDOCE grammar codes of the unknown word with those
associated with each synonym candidate. The use of LDOCE as a
syntactic filter on the semantics derived from WordNet is the key to
resolving word-sense ambiguity during the acquisition process. The
full acquisition algorithm is given in Figure 3.

Given a verb, check Levin class.

1. If in Levin, classify directly.

2. If not in Levin, find synonym set from WordNet.

(a) If synonym in Levin, select the class that has the closest match
with canonical LDOCE codes.

(b) If no synonyms in Levin or canonical LDOCE codes are completely
mismatched, hypothesize new class.

Figure 3: Algorithm for Semantic Classification of Novel Words
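The control flow of Figure 3 can be sketched as below. This is a rough
Python illustration under several assumptions: the dictionaries stand
in for Levin’s index, WordNet synonym sets, LDOCE entries, and the
canonical class-code mapping; "closest match" is taken to mean the
largest number of shared codes, which the text does not define
precisely; and all names and toy entries are hypothetical.

```python
from typing import Dict, List, Set


def classify(
    verb: str,
    levin: Dict[str, Set[str]],        # verb -> Levin classes (stand-in)
    synonyms: Dict[str, Set[str]],     # verb -> synonym set (stand-in for WordNet)
    ldoce: Dict[str, Set[str]],        # verb -> LDOCE grammar codes (stand-in)
    canonical: Dict[str, Set[str]],    # class -> canonical LDOCE codes (stand-in)
) -> List[str]:
    """Return the chosen class(es); an empty list means 'hypothesize new class'."""
    # Step 1: the verb is in Levin, so classify it directly.
    if verb in levin:
        return sorted(levin[verb])
    # Step 2: collect candidate classes from synonyms that are in Levin.
    codes = ldoce.get(verb, set())
    best_class, best_score = None, 0
    for synonym in sorted(synonyms.get(verb, set())):
        for cls in sorted(levin.get(synonym, set())):
            # Assumed closeness measure: number of shared LDOCE codes.
            score = len(codes & canonical.get(cls, set()))
            if score > best_score:
                best_class, best_score = cls, score
    # Step 2(a): a candidate class shares at least one code.
    # Step 2(b): no synonym in Levin, or every candidate is a complete mismatch.
    return [best_class] if best_class is not None else []


# Toy data (entirely invented).
LEVIN = {"known1": {"X"}, "known2": {"Y"}}
SYNONYMS = {"novel1": {"known1", "known2"}, "novel2": {"known2"}}
LDOCE = {"novel1": {"T1", "I"}, "novel2": {"T3"}}
CANONICAL = {"X": {"T1", "I"}, "Y": {"I"}}
```

With this toy data, novel1 is assigned class X (two shared codes
against one for Y), while novel2 shares no codes with its only
candidate class and therefore falls under case 2(b).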

Note that this algorithm assumes that there is a “canonical” set of
LDOCE codes for each of Levin’s semantic classes. Figure 4 describes
the significance of a subset of the syntactic codes in LDOCE. (The
total number of codes is 174.) We have developed a relation between
LDOCE codes and Levin classes, in much the same way that we associated
syntactic signatures with the semantic classes in the earlier
experiments. These canonical codes are used for syntactic filtering
(checking for the closest match) in the classification algorithm.

As an example of how the word-sense disambiguation and classification
process works, consider the non-Levin verb attempt. The LDOCE
specification for this verb is: T1 T3 T4 WV5 N. Using the synonymy
feature of WordNet, the algorithm automatically extracts five
candidate classes associated with the synonyms of this word: (1) Class
29.6 “Masquerade Verbs” (act), (2) Class 29.8 “Captain Verbs”
(pioneer), (3) Class 31.1 “Amuse Verbs” (try), (4) Class 35.6 “Ferret
Verbs” (seek), and (5) Class 55.2 “Complete Verbs” (initiate). The
synonyms for each of these classes have the following LDOCE encodings,
respectively: (1) I I-FOR I-ON I-UPON L1 L9 T1 N; (2) L9 T1 N; (3) I
T1 T3 T4 WV4 N; (4) I I-AFTER I-FOR T1 T3; and (5) T1 T1-INTO N. The
largest intersection with the syntactic codes for attempt occurs with
the verb try (T1 T3 T4 N). However, Levin’s class 31.1 is not the
correct class for attempt since this sense of try has a “negative
amuse” meaning (e.g., John’s behavior tried my patience). In fact, the
codes T1 T3 T4 are not part of the canonical class-code mapping
associated with class 31.1. Thus, attempt falls under case 2(b) of the
algorithm, and a new class is hypothesized. This is a case where
word-sense disambiguation has allowed us to classify a new word and to
enhance Levin’s verb classification by adding a new class for the word
try as well. In our experiments, our algorithm found several
additional non-Levin verbs that fell into this newly hypothesized
class, including aspire, attempt, dare, decide, desire, elect, need,
and swear.
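The code comparison in this example amounts to a set intersection,
which can be checked with a short Python sketch. The LDOCE codes are
those quoted in the text; the variable names are our own, and using
the size of the intersection as the ranking criterion is an assumption
about how "largest intersection" is computed.

```python
# LDOCE codes for the unknown verb, as quoted in the text.
ATTEMPT_CODES = {"T1", "T3", "T4", "WV5", "N"}

# LDOCE codes for the five synonyms, as quoted in the text.
SYNONYM_CODES = {
    "act": {"I", "I-FOR", "I-ON", "I-UPON", "L1", "L9", "T1", "N"},
    "pioneer": {"L9", "T1", "N"},
    "try": {"I", "T1", "T3", "T4", "WV4", "N"},
    "seek": {"I", "I-AFTER", "I-FOR", "T1", "T3"},
    "initiate": {"T1", "T1-INTO", "N"},
}

# Codes each synonym shares with "attempt".
shared = {syn: ATTEMPT_CODES & codes for syn, codes in SYNONYM_CODES.items()}

# Assumed ranking: the synonym with the most shared codes.
closest = max(shared, key=lambda syn: len(shared[syn]))
```

Every synonym other than try shares only two codes with attempt, so
try is selected with the four shared codes T1, T3, T4, and N.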

We have automatically classified 10,000 “unknown” verbs, i.e., those
not occurring in the Levin classification, using this technique. These
verbs are taken from English “glosses” (i.e., translations) provided
in bilingual dictionaries for Spanish and Arabic.4 As a preliminary
measure of success, we picked out 84 LDOCE control vocabulary verbs
(i.e., primitive words used for defining dictionary entries) and
hand-checked our results. We found that 69 verbs were classified
correctly, i.e., 82% accuracy.

6  Summary 

We have conducted two experiments with the intent of addressing the
issue of word-sense ambiguity in extraction from machine-readable
resources for the construction of large-scale knowledge sources. The
first experiment attempted to determine a relationship between a
semantic class and the syntactic information associated with each
class. Not surprisingly, but not insignificantly, this relationship
was very clear, since this experiment avoided the problem of word
sense ambiguity. In the second experiment, each verb that appeared in
different classes collected the syntactic information from every class
it appeared in. Therefore, the syntactic signature was composed from
all of the example sentences from every class the verb appeared in. In
some cases, the verbs were semantically unrelated and consequently the
mapping from syntax to semantics was muddied. These experiments served
to validate Levin’s claim that verb semantics and syntactic behavior
are predictably related and also demonstrated that a significant
component of any lexical acquisition program is the ability to perform
word-sense disambiguation.

We have used the results of our first two experiments to help in
constructing and augmenting online dictionaries for novel verb senses.
We have used the same syntactic signatures to categorize new verbs
into Levin’s classes on the basis of WordNet and LDOCE. We are
currently porting these results to new languages using online
bilingual lexicons.

LDOCE Code          Arguments        Adjuncts    Example
I                   —                —           Olivier is acting tonight
I-AFTER             —                PP[after]   She sought after the truth
I-FOR               —                PP[for]     They sought for the right one
I-ON                —                PP[on]      He acted on our suggestion
I-UPON              —                PP[upon]    The drug acted upon the pain
L1                  NP               —           He acts the experienced man
L9                  ADV/PP           —           The play acts well
T1                  NP               —           I pioneered the new land
T1-INTO             NP               PP[into]    We initiated him into the group
T3                  VP[to+inf]       —           He tried to do it
T4                  VP[+prog]        —           She tried eating the new food
WV4                 -ing adjectival  —           I’ve had a trying day
WV5                 -ed adjectival   —           He was convicted for attempted murder
N (denominal verb)  —                —           pioneer (noun)

Figure 4: Sample Syntactic Codes used in LDOCE


Note 

This report is published as a technical report jointly